Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bytes decoding from UTF-16 #1946

Merged
merged 8 commits into from
Jan 6, 2025
Merged

Add bytes decoding from UTF-16 #1946

merged 8 commits into from
Jan 6, 2025

Conversation

bbannier
Copy link
Member

Closes #1788.

@bbannier bbannier self-assigned this Dec 10, 2024
@bbannier bbannier force-pushed the topic/bbannier/issue-1788 branch 3 times, most recently from e78510a to 8711bff Compare December 16, 2024 12:46
@bbannier bbannier marked this pull request as ready for review December 16, 2024 13:09
@bbannier bbannier requested a review from rsmmr December 16, 2024 13:09
hilti/lib/hilti.hlt Outdated Show resolved Hide resolved
spicy/lib/spicy.spicy Outdated Show resolved Hide resolved
hilti/runtime/src/types/string.cc Outdated Show resolved Hide resolved
hilti/runtime/src/types/string.cc Outdated Show resolved Hide resolved
hilti/runtime/src/types/string.cc Outdated Show resolved Hide resolved
hilti/runtime/src/types/bytes.cc Outdated Show resolved Hide resolved
hilti/runtime/src/types/bytes.cc Outdated Show resolved Hide resolved
hilti/runtime/src/types/bytes.cc Show resolved Hide resolved
hilti/runtime/src/types/bytes.cc Outdated Show resolved Hide resolved
@bbannier bbannier force-pushed the topic/bbannier/issue-1788 branch from 44df820 to 359c3f5 Compare December 17, 2024 11:18
@bbannier bbannier requested a review from rsmmr December 17, 2024 11:18
rsmmr
rsmmr previously approved these changes Dec 17, 2024
Copy link
Member

@rsmmr rsmmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one thing I'm still missing is HILTI-level, i.e., use UTF16 in tests/hilti/types/bytes/decode.hlt and tests/hilti/types/string/encode.hlt. Otherwise looks good.

hilti/runtime/src/types/bytes.cc Show resolved Hide resolved
This was already exposed as a single type in Spicy and HILTI anyway. We
also move `Charset` into the `unicode` namespace.
This allows a cleaner separation between `Bytes` as "bags of bytes" and
`string` as "valid UTF8". Having such a clean separation will make
adding support for more encodings less duplicative.
This parameter defaults to `DecodeErrorStrategy::REPLACE` like the
previous implicit parameter used in the implementation.
This adds two new charsets `UTF16LE` and `UTF16BE` for little and big
endian UTF16 respectively.

We also clean up use of the Unicode replacement character to make it
work consistently between UTF16 and UTF8.

Closes #1788.
By passing in a `const std::string&` we would have incurred construction
costs of an owning string when all we needed was a non-owning view.

We also clean up the implementation of `startsWith` to use standard
library functionality.
@bbannier bbannier force-pushed the topic/bbannier/issue-1788 branch from 8dc1550 to 9a773ce Compare December 20, 2024 12:00
@bbannier bbannier requested a review from rsmmr December 20, 2024 13:29
@bbannier bbannier merged commit 77649cd into main Jan 6, 2025
20 checks passed
@bbannier bbannier deleted the topic/bbannier/issue-1788 branch January 6, 2025 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

decode() on bytes should support UTF-16
2 participants